# Swin-BART Architecture
Donut Receipts Extract
A specialized receipt text extraction model based on the Donut architecture, achieving OCR-free document understanding with a visual encoder and a text decoder.
Image-to-Text
Transformers

AdamCodd
66
34
Uae License Detection
MIT
Donut is an OCR-free document understanding Transformer model that combines a visual encoder and a text decoder to process document images.
Image-to-Text
Transformers

codedrainer
21
2
Donut Base Finetuned Invoices
A multilingual invoice processing model built on the Donut architecture, capable of extracting key invoice fields.
Image-to-Text
Transformers

to-be
823
21
Donut Base Finetuned Zhtrainticket
MIT
Donut model fine-tuned on ZhTrainTicket for document image-to-text conversion without OCR processing.
Image-to-Text
Transformers

naver-clova-ix
362
0
Donut Base Finetuned Cord V1 2560
MIT
Donut is an OCR-free document understanding Transformer model that combines a visual encoder with a text decoder to achieve image-to-text conversion.
Image-to-Text
Transformers

naver-clova-ix
30
1
Donut Base Finetuned Docvqa
MIT
Donut is an OCR-free document understanding Transformer model, fine-tuned on the DocVQA dataset, capable of directly extracting and comprehending text information from images.
Image-to-Text
Transformers

naver-clova-ix
167.80k
231
Donut Base Finetuned Rvlcdip
MIT
Donut is an OCR-free document understanding Transformer model that combines a visual encoder and text decoder to process document images.
Image-to-Text
Transformers

naver-clova-ix
125.36k
13
Donut Proto
MIT
Donut is an OCR-free document understanding Transformer model that combines a visual encoder and a text decoder for image-to-text conversion.
Image-to-Text
Transformers

naver-clova-ix
30
7
Donut Base
MIT
Donut is an OCR-free document understanding Transformer model composed of a visual encoder (Swin Transformer) and a text decoder (BART).
Image-to-Text
Transformers

naver-clova-ix
50.34k
207
Donut Base Finetuned Cord V2
MIT
Donut is an OCR-free document understanding Transformer model composed of a visual encoder (Swin Transformer) and a text decoder (BART), capable of directly extracting text information from images.
Image-to-Text
Transformers

naver-clova-ix
21.63k
97